feat: implement grading strategies for peer evaluations #2196
Conversation
Thanks for the pull request, @mariajgrimaldi! Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket as you can:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green.
Force-pushed from df6bf6a to d2bf397.
Force-pushed from 94a83e3 to dd73e24.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@            Coverage Diff            @@
##           master    #2196     +/-  ##
========================================
  Coverage   95.05%   95.06%
========================================
  Files         193      193
  Lines       21149    21285    +136
  Branches     1906     1918     +12
========================================
+ Hits        20104    20235    +131
- Misses        780      785      +5
  Partials      265      265

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Force-pushed from 49a679f to 78549ab.
I still need to generate all the translations so tests don't fail. Meanwhile, I'll open the PR for review. Thank you!
@BryanttV: I think I addressed all your comments. Could you review again?
I still need to fix the JS tests, so I'll work on that. Thanks!
Hi @mariajgrimaldi, I have tested again and everything seems to work correctly. Only a minor change would be needed. Thank you!
Force-pushed from 0c1362b to a6bae46.
Force-pushed from aa96648 to 436ffce.
Hello, @pomegranited! I hope you're doing well. Could you give us a hand reviewing this PR? We'd appreciate it. Regarding the test failures, I'll be working on increasing the PR coverage this week. I hope to update the PR soon. Thanks!
Force-pushed from 436ffce to ced373a.
Hi @mariajgrimaldi, thank you for this feature! I think it'll be useful for content authors.
I noted a few minor nits, and had a question about rounding, but nothing that's blocking merge. If you agree with my suggested changes, could you apply them and do a version bump? I can merge tomorrow.
👍
- I tested this on my tutor dev stack with:
  - the flag disabled (the default) -- no option provided, defaulted to median
  - the flag enabled -- options provided, changed to "mean", was not able to change after the block was published
- I read through the code
- I checked for accessibility issues by using my keyboard to navigate the ORA editor.
- Includes documentation
- User-facing strings are extracted for translation
total_criterion_scores = len(scores)
if total_criterion_scores == 0:
    return 0
# Round the mean of the peer scores up to the nearest whole number.
return math.ceil(sum(scores) / float(total_criterion_scores))
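Regarding the rounding question raised in the review above: the snippet rounds the mean up to the next whole number. A quick standalone check with the same arithmetic (plain Python, not the PR's test suite):

import math

def ceil_mean(scores):
    # Same arithmetic as the snippet above: ceiling of the mean, 0 if empty.
    if not scores:
        return 0
    return math.ceil(sum(scores) / float(len(scores)))

print(ceil_mean([0, 1, 3, 1, 5]))  # 10 / 5 = 2.0  -> 2
print(ceil_mean([0, 3, 5]))        #  8 / 3 = 2.67 -> 3 (rounded up)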
This threw me for a loop, so I tried replicating the same scenario. Here are the criteria I used:
Ideas: poor=0, fair=3, good=5
Content: poor=0, fair=1, good=3, excellent=5
Graded-by-5 configuration
And got:
While researching, I found that the assessments are retrieved, ordered, and then truncated to the number of peer grades needed. Could you confirm that the number of peer grades configured while testing was 5?
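To illustrate why that retrieve-order-truncate step could matter (a hypothetical sketch, not the actual query code): if the peer scores are cut down to the first N received before averaging, two orderings of the same scores can produce different grades.

import math

def ceil_mean_of_first_n(scores, n):
    # Hypothetical: keep only the first n peer scores, then take the
    # ceiling of their mean, mirroring the mean calculation above.
    selected = scores[:n]
    return math.ceil(sum(selected) / float(len(selected)))

print(ceil_mean_of_first_n([1, 0, 3, 1, 5], 3))  # (1 + 0 + 3) / 3 = 1.33 -> 2
print(ceil_mean_of_first_n([5, 3, 1, 1, 0], 3))  # (5 + 3 + 1) / 3 = 3.0  -> 3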
I also added an extra test here with the exact values: https://github.com/openedx/edx-ora2/pull/2196/files#diff-7fe9b2681acaf897f2dd62a13553badf288fd6144832ce0562c4a58bd2640e7fR2381
I also reproduced the same scenario with the graded-by-3 configuration and got 2, since the first three peers graded 1, 0, and 3 in that order. Please let me know if the submission order also affected your tests.
Oh wow, submission order does affect the final score! That's annoying.
I also added an extra test here with the exact values: https://github.com/openedx/edx-ora2/pull/2196/files#diff-7fe9b2681acaf897f2dd62a13553badf288fd6144832ce0562c4a58bd2640e7fR2381
Thanks for that -- I also ran this test with the numbers shuffled in a different order, to confirm that it's not the "mean score" logic that's causing this issue. It must be something done after that.
--- a/openassessment/assessment/test/test_peer.py
+++ b/openassessment/assessment/test/test_peer.py
@@ -2375,6 +2375,7 @@ class TestPeerApi(CacheResetTest):
self.assertEqual(31, Assessment.get_mean_score([5, 6, 12, 16, 22, 53, 102]))
self.assertEqual(31, Assessment.get_mean_score([16, 6, 12, 102, 22, 53, 5]))
self.assertEqual(2, Assessment.get_mean_score([0, 1, 3, 1, 5]))
+ self.assertEqual(2, Assessment.get_mean_score([5, 3, 1, 1, 0]))
Could you confirm that the number of peer grades configured while testing was 5?
Yes, 5 peer grades were required, and only 5 were received.
I used the default graded-by-5 configuration, with
Ideas: poor=0, fair=3, good=5
Content: poor=0, fair=1, good=3, excellent=5
Audit user received the same "Content" peer assessments as Honor, but in a different order, and so they got different scores. (Apologies about the "Ideas" assessments, I thought I did them all the same too, but apparently not.)
@mariajgrimaldi I've re-tested with your changes and, with the exception of the ordering issue discussed above, I found no other issues. Redwood gets cut soon, so I'm happy to merge this now, and then you can do a follow-up PR to fix the ordering bug? We'll also need a version bump to get this into edx-platform:
@pomegranited: thank you again for such a detailed review! We appreciate it. I just bumped the version number in openassessment/__init__.py and package.json, and I'll be researching the grading-ordering issue further over the next few days. I'll let you know what I find!
@mariajgrimaldi 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.
Awesome, thank you for your rapid reply @mariajgrimaldi! ora2==6.10.0 is released and ready for you to do a version bump PR in edx-platform if you want.
TL;DR - This PR adds a new grading calculation strategy that can be configured in the peer step's assessment configuration and used instead of the median calculation.
Currently, steps that calculate a final score based on multiple scores (e.g. peer steps) do so using the median by default; the calculation is done here. Using the median for this use case has its advantages. However, using it by default with no way to reconfigure it rules out other use cases that could be served by calculating the final score with the mean instead. Making the strategy configurable also opens the door to the other grading strategies mentioned in this thread.
That's why this product proposal was created and later approved.
What changed?
Currently:
- The get_score function gets the median score of all the scores the student has at the moment of grade generation by using get_assessments_median_scores.
- get_assessments_median_scores calls get_median_scores_dict, which calculates the final score for the student using the median.
To make the grading strategy configurable, we:
- Changed each reference to get_assessments_median_scores to the more generic get_assessments_peer_scores_with_grading_strategy, which determines which calculation to use based on the workflow configuration (see the sketch below). If the feature flag ENABLE_ORA_PEER_CONFIGURABLE_GRADING is off, the default get_assessments_median_scores is used.
By default, the median calculation is used.
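As a rough sketch of that dispatch (illustrative only: get_peer_score_sketch and its arguments are invented here, while the strategy names and the feature flag come from the description above):

import math
import statistics

def get_peer_score_sketch(scores, grading_strategy, flag_enabled):
    # Illustrative dispatch between the median and mean calculations,
    # following the description above; not the actual ORA implementation.
    if not scores:
        return 0
    if flag_enabled and grading_strategy == "mean":
        return math.ceil(sum(scores) / float(len(scores)))
    # Flag off or strategy left at "median": keep the existing behavior.
    return math.ceil(statistics.median(scores))

print(get_peer_score_sketch([0, 1, 5], "median", False))  # 1
print(get_peer_score_sketch([0, 1, 5], "mean", True))     # 2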
Developer Checklist
Testing Instructions
FEATURES["ENABLE_ORA_PEER_CONFIGURABLE_GRADING"] = True
Student 1 submits the ORA activity with a distinctive answer so we can distinguish it later
Student 2 scores Student 1 with 0
Student 3 scores Student 1 with 1
Student 4 scores Student 1 with 5
Now, the median is: 1 but the mean is: 2. So check that grading matches the mean value and the explanation says mean instead of median.
If you don't configure the mean grading then the default behavior is kept:
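A quick sanity check of those numbers in plain Python (not the ORA code; just the raw median vs. the rounded-up mean of the scores 0, 1, and 5 given above):

import math
import statistics

scores = [0, 1, 5]  # peer scores given to Student 1 in the steps above
print(statistics.median(scores))                    # 1 -> grade under the default median strategy
print(math.ceil(sum(scores) / float(len(scores))))  # 2 -> grade under the mean strategy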
Reviewer Checklist
Collectively, these should be completed by reviewers of this PR: